3-Factor Asset Pricing Analysis

Based on their rational definitions, factors can be seen as proxies for the characteristics of equities and other assets that explain performance and provide premiums as compensation for a relative risk. In the Fama-French 3-Factor Model for asset pricing, these risk factors are market beta, market capitalization, and book-to-market equity (operating profitability, change in investment assets, and recent performance momentum were excluded from this analysis). The data is accessed from the online library provided by Kenneth French, which publishes returns relevant to the research into asset pricing models by Eugene Fama and Kenneth French. Because the online library is segmented by type, the data was collected iteratively for all relevant types and stored as variables in a PKL file or as sheets in an XLSX file. An exploratory analysis was performed to show the distributions, time-varying characteristics, and interactions affecting the realized returns. The project is written in Python and primarily uses NumPy, Pandas, Matplotlib, Seaborn, urllib, and pickle.

https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
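As a rough illustration of the retrieval step, the following is a minimal sketch that assumes each file in the library is a ZIP archive containing a single CSV file (as the URLs in the metadata suggest). The function names `fetch_zip` and `first_member_text` and the `latin-1` decoding are assumptions made for this example and are not taken from the project.

```python
import io
import zipfile
from urllib.request import urlopen


def fetch_zip(url: str, timeout: float = 30.0) -> bytes:
    """Download the archive at `url` and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def first_member_text(payload: bytes, encoding: str = "latin-1") -> str:
    """Return the decoded text of the first file inside a ZIP archive.

    The encoding is an assumption; latin-1 never fails to decode,
    which makes it a forgiving default for a sketch.
    """
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        name = archive.namelist()[0]
        return archive.read(name).decode(encoding)
```

A typical use would be `first_member_text(fetch_zip(url))`, with the resulting text then passed on for parsing. Keeping the download separate from the extraction allows the extraction to be tested without network access.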

Data Considerations

In the construction of the data, the returns are in USD and include dividends and capital gains for the period, without fees or taxes and without continuous compounding (unless specified as annualized). The return from market beta is the difference in return between a value-weighted market portfolio and the 1-month U.S. Treasury bill, which serves as the risk-free rate. For the size and value factors, equities are sorted into 2 groups by market capitalization (with the big group covering the upper 90% and the small group covering the lower 10% by market capitalization) and into 3 groups by book-to-market equity with breakpoints at the 30th and 70th percentiles, which yields 6 portfolios. The return from the value factor (High Minus Low) is the average return of the 2 value portfolios minus the average return of the 2 growth portfolios. The return from the size factor (Small Minus Big) is the average return of the 3 small portfolios minus the average return of the 3 big portfolios, with the portfolios in each average weighted equally.

Definition used in the data for market beta relative to the risk-free rate:

\begin{gather*} \text{Mkt-Rf} = \text{Value-Weighted Market Portfolio} - \text{Risk-Free Rate} \end{gather*}

Definition used in the data for the size factor and categorized as Small Minus Big:

\begin{gather*} \text{SMB} = \frac{1}{3} (\text{Small Value} + \text{Small Neutral} + \text{Small Growth}) - \frac{1}{3} (\text{Big Value} + \text{Big Neutral} + \text{Big Growth}) \end{gather*}

Definition used in the data for the value factor and categorized as High Minus Low:

\begin{gather*} \text{HML} = \frac{1}{2} (\text{Small Value} + \text{Big Value}) - \frac{1}{2} (\text{Small Growth} + \text{Big Growth}) \end{gather*}
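The two definitions above can be checked numerically. The following is a small sketch using hypothetical yearly returns (in percent) for the six size and value portfolios. The figures and the function name `factor_returns` are invented for illustration and are not values from the data library.

```python
import pandas as pd

# Hypothetical yearly returns in percent; not actual data.
six = pd.DataFrame(
    {
        "Small Growth": [10.0, -4.0],
        "Small Neutral": [12.0, -2.0],
        "Small Value": [15.0, 1.0],
        "Big Growth": [8.0, -6.0],
        "Big Neutral": [9.0, -3.0],
        "Big Value": [11.0, -1.0],
    },
    index=[2001, 2002],
)


def factor_returns(portfolios: pd.DataFrame) -> pd.DataFrame:
    """Compute SMB and HML as equally-weighted averages of the six portfolios."""
    small = portfolios[["Small Value", "Small Neutral", "Small Growth"]].mean(axis=1)
    big = portfolios[["Big Value", "Big Neutral", "Big Growth"]].mean(axis=1)
    value = portfolios[["Small Value", "Big Value"]].mean(axis=1)
    growth = portfolios[["Small Growth", "Big Growth"]].mean(axis=1)
    return pd.DataFrame({"SMB": small - big, "HML": value - growth})


factors = factor_returns(six)
print(factors)
```

Because each factor is a difference between long and short averages, it represents a zero-investment return rather than the return of a long-only portfolio.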

For this analysis, there is also consideration for portfolios specifically constructed to target various combinations of factors. This allows for evaluation of the intersection of independent portfolios with differing ranges of sorts between and within factors. For example, the sorts of portfolios can be formed by looking at the size factor with increasing degrees of market capitalization or the value factor with increasing degrees of book-to-market equity. Similarly, the sorts of portfolios can be formed by looking at the size factor and value factor together, using 2 groups for market capitalization and 3 groups for book-to-market equity to produce 6 portfolios at the intersections of small or large market capitalization with high, neutral, or low book-to-market equity.
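To make the 2 by 3 sort concrete, the following is a minimal sketch on a hypothetical cross-section of ten equities. The column names, the sample values, the function name `assign_2x3`, and the default breakpoints (the median for size and the 30th and 70th percentiles for book-to-market equity) are assumptions for illustration. The breakpoints used in the data library depend on the region and are not reproduced here.

```python
import pandas as pd

# Hypothetical cross-section for one formation date; not actual data.
stocks = pd.DataFrame(
    {
        "market_cap": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
        "book_to_market": [0.9, 0.2, 0.5, 1.0, 0.1, 0.3, 0.8, 0.4, 0.6, 0.7],
        "return": [14.0, 6.0, 9.0, 16.0, 4.0, 5.0, 11.0, 8.0, 7.0, 9.0],
    }
)


def assign_2x3(frame, size_quantile=0.5, value_quantiles=(0.3, 0.7)):
    """Label each equity with one of six size and value portfolios."""
    size_break = frame["market_cap"].quantile(size_quantile)
    low_break = frame["book_to_market"].quantile(value_quantiles[0])
    high_break = frame["book_to_market"].quantile(value_quantiles[1])

    def label(row):
        size = "Small" if row["market_cap"] <= size_break else "Big"
        if row["book_to_market"] <= low_break:
            value = "Growth"
        elif row["book_to_market"] > high_break:
            value = "Value"
        else:
            value = "Neutral"
        return f"{size} {value}"

    return frame.apply(label, axis=1)


labels = assign_2x3(stocks)

# Value-weighted return of each portfolio (weights are market capitalizations).
weighted = (stocks["return"] * stocks["market_cap"]).groupby(labels).sum()
weighted = weighted / stocks["market_cap"].groupby(labels).sum()
print(weighted)
```

The size breakpoint is exposed as a parameter so that a different split can be tried without changing the labelling logic.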

With regard to the regions, countries are grouped by their classification as developed markets or emerging markets (which generally follows the classifications from MSCI) and by relative location, with the extent of the data varying based on availability. The developed markets include Australia, Austria, Belgium, Canada, Switzerland, Germany, Denmark, Spain, Finland, France, Great Britain, Greece, Hong Kong, Ireland, Italy, Japan, Netherlands, Norway, New Zealand, Portugal, Sweden, Singapore, and United States. The European region includes Austria, Belgium, Switzerland, Germany, Denmark, Spain, Finland, France, Great Britain, Greece, Ireland, Italy, Netherlands, Norway, Portugal, and Sweden. The Asia Pacific region includes Australia, Hong Kong, Japan, New Zealand, and Singapore. The emerging markets include Brazil, Chile, China, Colombia, Czech Republic, Egypt, Greece, Hungary, India, Indonesia, Malaysia, Mexico, Pakistan, Peru, Philippines, Poland, Qatar, Saudi Arabia, South Africa, South Korea, Taiwan, Thailand, Turkey, and United Arab Emirates.

Individual Factor Components

For the broadest evaluation, the yearly returns can be calculated for developed markets and emerging markets. From this, it is clear that market beta, the size factor, and the value factor realized positive and significant premiums on average in both groups of markets. As explored, there is an empirical reason based on higher risk (through higher discount rates) for an expected premium to be realized. Without such a reason, it would be more likely that any result is due to data mining, random chance, or inconsistent effects, and there would be no genuine expectation for the premium to persist in the future. Interestingly, the size factor appears weaker and less reliable than the other factors and may not provide a robust premium across time (which is somewhat understandable, as the empirical case for the existence of the size factor is less established).

Yearly realized returns for the components of the Fama-French 3-Factor Model in developed markets:

Yearly realized returns for the components of the Fama-French 3-Factor Model in emerging markets:

Individual regions, including the United States, Europe, Japan, and Asia Pacific excluding Japan, can be considered to evaluate the pervasiveness of the factors. From this, it is clear that market beta and the value factor realize positive and significant premiums on average and consistently across each of the individual regions. By contrast, the size factor in isolation often does not provide a robust premium across time.

Yearly realized returns for the components of the Fama-French 3-Factor Model in the United States:

Yearly realized returns for the components of the Fama-French 3-Factor Model in Europe:

Yearly realized returns for the components of the Fama-French 3-Factor Model in Japan:

Yearly realized returns for the components of the Fama-French 3-Factor Model in Asia Pacific excluding Japan:

As a measure of the association between the factors, the correlations of the yearly realized returns can be considered. These considerations can be related to Modern Portfolio Theory from Harry Markowitz, where it is asserted that diversification and a reduced risk of losses can be achieved by minimizing the correlation of assets within a portfolio. In other words, for assets held in a portfolio with positive weights, a perfect positive correlation leaves the standard deviation of the portfolio equal to the weighted average of the standard deviations of the assets (no diversification benefit), while any imperfect positive or negative correlation brings it below that weighted average. When constructing a portfolio, this allows for the optimization of the expected return against a certain level of risk (assuming standard deviation is a suitable proxy for risk). However, it should be kept in mind that these correlations are not necessarily fixed and may cluster differently across periods.
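The effect of correlation on diversification can be illustrated with the standard two-asset case, where the portfolio variance is the sum of the weighted variances plus a covariance term. The following is a small sketch, where the weights and volatilities are hypothetical values chosen for the example.

```python
import math


def portfolio_volatility(w1: float, sigma1: float, sigma2: float, rho: float) -> float:
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Hypothetical inputs: equal weights, volatilities of 20% and 10%.
weighted_average = 0.5 * 0.20 + 0.5 * 0.10
for rho in (1.0, 0.5, 0.0, -0.5):
    sigma = portfolio_volatility(0.5, 0.20, 0.10, rho)
    print(f"rho={rho:+.1f}  volatility={sigma:.4f}  weighted average={weighted_average:.4f}")
```

Only at a correlation of exactly 1 does the portfolio volatility reach the weighted average of the individual volatilities. For every lower correlation it falls below that level.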

The correlations also reveal whether the factors are actually distinct in their definitions. In a sense, it would be expected for the factors to act as independent principal components, although they have been identified beforehand based on empirical reasoning. However, as their definitions may share qualities, their results may be similar in some periods and they may not be completely disconnected. It should also be acknowledged that the correlations between factors would be expected to increase as the number of factors increases, as there is a finite spectrum of unique information that can be extracted and overlaps have to occur as more factors are added to imperfectly fill in the remaining gaps. This is seen in the incremental but diminishing improvements between the Capital Asset Pricing Model, Fama-French 3-Factor Model, and Fama-French 5-Factor Model.
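The following is a minimal sketch of how such correlations might be computed with Pandas, including a rolling version to check whether the association changes between periods. The yearly figures are hypothetical, and the window of 4 years is an arbitrary choice for the example.

```python
import pandas as pd

# Hypothetical yearly factor returns in percent; not actual data.
factors = pd.DataFrame(
    {
        "Mkt-RF": [12.0, -8.0, 20.0, 5.0, -15.0, 10.0],
        "SMB": [3.0, 1.0, -2.0, 4.0, -1.0, 0.5],
        "HML": [-4.0, 6.0, 2.0, -1.0, 8.0, 3.0],
    },
    index=[2001, 2002, 2003, 2004, 2005, 2006],
)

# Full-sample correlation matrices.
pearson = factors.corr(method="pearson")
spearman = factors.corr(method="spearman")

# Rolling correlation between two factors over a 4-year window.
rolling = factors["Mkt-RF"].rolling(window=4).corr(factors["HML"])

print(pearson.round(2))
print(rolling.round(2))
```

Pandas also accepts `method="kendall"`, although that option may require SciPy to be installed. With yearly data the rolling estimates rest on very few observations, so they are better read as a rough indication than as precise values.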

Correlations of yearly realized returns for the components of the Fama-French 3-Factor Model in various markets:

Constructed Portfolio Sorts

With the available data in the United States, it is possible to construct univariate portfolios based on the individual factors (and market beta), where each portfolio has an increasing degree of exposure to the related factor. Using the relevant fundamental ratio, the equities are divided into the lower quintile of equities with the lowest values, middle quintiles of equities with the average values (considered to be core, blend, or neutral), and upper quintile of equities with the highest values. This allows for an ideal overview of the expected characteristics and performance of long-only portfolios based on the corresponding construction (although without consideration for fees or taxes and obstacles of implementation).
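A univariate sort of this kind can be sketched with `pandas.qcut`, which splits a series into groups of equal size by rank. The sample ratios, the returns, and the equal weighting of returns within each quintile are assumptions for illustration and do not reflect how the portfolios in the data library are weighted.

```python
import pandas as pd

# Hypothetical cross-section: a fundamental ratio and a yearly return per equity.
stocks = pd.DataFrame(
    {
        "ratio": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
        "return": [4.0, 6.0, 5.0, 7.0, 8.0, 6.0, 9.0, 11.0, 10.0, 14.0],
    }
)

# Assign each equity to a quintile of the ratio, from lowest to highest.
stocks["quintile"] = pd.qcut(
    stocks["ratio"],
    q=5,
    labels=["Q1 (lowest)", "Q2", "Q3", "Q4", "Q5 (highest)"],
)

# Equal-weighted average return of each quintile portfolio.
quintile_returns = stocks.groupby("quintile", observed=True)["return"].mean()
print(quintile_returns)
```

Comparing the lowest and highest quintiles then gives a first impression of how performance varies with exposure to the factor.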

Yearly returns for univariate portfolios of size from the Fama-French 3-Factor Model in the United States:

Yearly returns for univariate portfolios of value from the Fama-French 3-Factor Model in the United States:

To identify interactions, portfolios can be constructed to target combinations of factors through bivariate sorts (with market beta being common to each portfolio). For size and value, the portfolios are formed from the lower and upper halves of market capitalization and the lower 30%, middle 40%, and upper 30% of book-to-market equity.

An interesting finding is that, although the size factor in isolation often does not provide a robust premium across time in different regions, the magnitudes of other factors tend to be enhanced for equities with small market capitalizations. With regard to the value factor, for instance, there is a premium for equities with cheap valuations over equities with expensive valuations, but there is also a premium for equities with small market capitalizations and cheap valuations over equities with large market capitalizations and cheap valuations. It has been speculated with evidence that the lack of a premium from the size factor in isolation is largely due to the volatile performance of equities with small market capitalizations and low quality (which may be indirectly removed when considering the size factor in combination with other factors due to their very poor multiples). Thus, if considering the size factor, it is reasonable to also account for quality to improve its efficacy, robustness, consistency, and stability across time.

However, an aspect to keep in mind is that the exposure to factors may not necessarily be the same for related sorts given the sort by size. There would typically be more equities with small market capitalizations than equities with large market capitalizations, which allows for greater diversity in the characteristics of these equities. As a consequence, for example, a portfolio targeting the lower half of market capitalizations and upper third of book-to-market equity may have higher exposure to the value factor than a portfolio targeting the upper half of market capitalizations and upper third of book-to-market equity, so differences in performance may be attributed to this difference in exposure rather than the difference in size. With equivalent characteristics, there is evidence that the size factor has no further impact.

Yearly returns for bivariate portfolios of size and value from the Fama-French 3-Factor Model in developed markets:

Yearly returns for bivariate portfolios of size and value from the Fama-French 3-Factor Model in emerging markets:

Yearly returns for bivariate portfolios of size and value from the Fama-French 3-Factor Model in the United States:

Yearly returns for bivariate portfolios of size and value from the Fama-French 3-Factor Model in Europe:

Yearly returns for bivariate portfolios of size and value from the Fama-French 3-Factor Model in Japan:

Yearly returns for bivariate portfolios of size and value from the Fama-French 3-Factor Model in Asia Pacific excluding Japan:

Software Architecture Overview

The project was designed with an object-oriented approach using a class for each part. Through the metadata class, the selection and reference of data is controlled by a CSV file specifying a title to assign for display, region for which the data is applicable, label to use as the variable name, URL from which to access the data, and indices of the relevant sections and columns within the data. These indices need to be manually assigned based on which sections should be extracted and which columns should be extracted within each section, while the labels are also required and form part of the dataframes. Specifically, the ascribed columns are Title, Region, Label, URL, SetsIndex, SetsTitle, SetsLabel, ColumnsName, and ColumnsIndex. Although purposefully designed to work with the online library provided by Kenneth French, the subsequent operations would work with any file of the same structure.

Example of a CSV detailing the information to generate an object using the metadata class:

				Title, Region, Label, URL, SetsIndex, SetsTitle, SetsLabel, ColumnsName, ColumnsIndex
				Three Factor Model, Developed Markets, three_factor_dv, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Developed_3_Factors_CSV.zip, 1, Yearly Data, data_year, "Date, Mkt-RF, SMB, HML, RF", "0, 1, 2, 3, 4"
				Three Factor Model, Emerging Markets, three_factor_em, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Emerging_5_Factors_CSV.zip, 1, Yearly Data, data_year, "Date, Mkt-RF, SMB, HML, RF", "0, 1, 2, 3, 6"
				Three Factor Model, United States, three_factor_us, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip, 1, Yearly Data, data_year, "Date, Mkt-RF, SMB, HML, RF", "0, 1, 2, 3, 4"
				Three Factor Model, Europe, three_factor_eu, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Europe_3_Factors_CSV.zip, 1, Yearly Data, data_year, "Date, Mkt-RF, SMB, HML, RF", "0, 1, 2, 3, 4"
				Three Factor Model, Japan, three_factor_jp, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Japan_3_Factors_CSV.zip, 1, Yearly Data, data_year, "Date, Mkt-RF, SMB, HML, RF", "0, 1, 2, 3, 4"
				Three Factor Model, Asia Pacific ex Japan, three_factor_as, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Asia_Pacific_ex_Japan_3_Factors_CSV.zip, 1, Yearly Data, data_year, "Date, Mkt-RF, SMB, HML, RF", "0, 1, 2, 3, 4"
				Portfolios Size, United States, size_us, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Portfolios_Formed_on_ME_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Quintile 10%, Quintile 30%, Quintile 50%, Quintile 70%, Quintile 90%", "0, 5, 6, 7, 8, 9"
				Portfolios Value, United States, value_us, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Portfolios_Formed_on_BE-ME_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Quintile 10%, Quintile 30%, Quintile 50%, Quintile 70%, Quintile 90%", "0, 5, 6, 7, 8, 9"
				Portfolios Size-Value, Developed Markets, size_value_dv, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Developed_6_Portfolios_ME_BE-ME_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoBM, Small Neutral, Small HiBM, Large LoBM, Large Neutral, Large HiBM", "0, 1, 2, 3, 4, 5, 6"
				Portfolios Size-Value, Emerging Markets, size_value_em, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Emerging_Markets_6_Portfolios_ME_BE-ME_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoBM, Small Neutral, Small HiBM, Large LoBM, Large Neutral, Large HiBM", "0, 1, 2, 3, 4, 5, 6"
				Portfolios Size-Value, United States, size_value_us, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/6_Portfolios_2x3_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoBM, Small Neutral, Small HiBM, Large LoBM, Large Neutral, Large HiBM", "0, 1, 2, 3, 4, 5, 6"
				Portfolios Size-Value, Europe, size_value_eu, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Europe_6_Portfolios_ME_BE-ME_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoBM, Small Neutral, Small HiBM, Large LoBM, Large Neutral, Large HiBM", "0, 1, 2, 3, 4, 5, 6"
				Portfolios Size-Value, Japan, size_value_jp, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Japan_6_Portfolios_ME_BE-ME_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoBM, Small Neutral, Small HiBM, Large LoBM, Large Neutral, Large HiBM", "0, 1, 2, 3, 4, 5, 6"
				Portfolios Size-Value, Asia Pacific ex Japan, size_value_as, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Asia_Pacific_ex_Japan_6_Portfolios_ME_BE-ME_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoBM, Small Neutral, Small HiBM, Large LoBM, Large Neutral, Large HiBM", "0, 1, 2, 3, 4, 5, 6"
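As a rough sketch of how a row of this CSV could be turned into an object, the following parses the first example row with the `csv` module from the standard library. The `SourceSpec` dataclass, its field names, and the helper functions are hypothetical and are not the metadata class of the project. The sketch also assumes one section per row, as in the example above.

```python
import csv
import io
from dataclasses import dataclass

SAMPLE = (
    "Title, Region, Label, URL, SetsIndex, SetsTitle, SetsLabel, ColumnsName, ColumnsIndex\n"
    "Three Factor Model, Developed Markets, three_factor_dv, "
    "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Developed_3_Factors_CSV.zip, "
    '1, Yearly Data, data_year, "Date, Mkt-RF, SMB, HML, RF", "0, 1, 2, 3, 4"\n'
)


@dataclass
class SourceSpec:
    title: str
    region: str
    label: str
    url: str
    sets_index: int
    sets_title: str
    sets_label: str
    columns_name: list
    columns_index: list


def split_field(text: str) -> list:
    """Split a quoted, comma-separated cell into trimmed items."""
    return [item.strip() for item in text.split(",")]


def load_specs(handle) -> list:
    # skipinitialspace lets the quoted cells after ", " be read as single fields.
    reader = csv.DictReader(handle, skipinitialspace=True)
    specs = []
    for row in reader:
        row = {key.strip(): value.strip() for key, value in row.items()}
        specs.append(
            SourceSpec(
                title=row["Title"],
                region=row["Region"],
                label=row["Label"],
                url=row["URL"],
                sets_index=int(row["SetsIndex"]),
                sets_title=row["SetsTitle"],
                sets_label=row["SetsLabel"],
                columns_name=split_field(row["ColumnsName"]),
                columns_index=[int(i) for i in split_field(row["ColumnsIndex"])],
            )
        )
    return specs


specs = load_specs(io.StringIO(SAMPLE))
print(specs[0].label, specs[0].columns_name, specs[0].columns_index)
```

The `skipinitialspace=True` option matters for this layout, because the quoted cells follow a comma and a space and would otherwise be split at their internal commas.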

The metadata class treats each row in the CSV file as a source to be used. For each source, an object is created through the source class to handle the retrieval and extraction of the target data. An object created from the source class consists of identification properties from the associated metadata class, as well as the raw and processed data as dataframes. The raw data is simply the retrieved data formatted numerically, with NaN values occupying incompatible records (a downloaded version of the original data is also optionally stored). As a result, each section is a continuous set of numeric values with groups of NaN values between them, which allows for the simple identification of sections based on the alignment of these values. The processed data then extracts the relevant sections and labels them based on the variable names from the metadata. The relevant columns for the corresponding sets are then selected in each of the extracted sets (in all cases, it is necessary to select the column for dates, which is formatted as datetime values).
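The identification of sections through runs of numeric rows can be sketched as follows. The sample table, the function names `split_sections` and `extract`, and the rule that a row counts as data when its first column is numeric are assumptions for this example rather than the logic of the source class.

```python
import pandas as pd

# Hypothetical file contents: text rows and header rows separate two numeric sections.
original = pd.DataFrame(
    [
        ["Monthly returns", None, None],
        [None, "Mkt-RF", "SMB"],
        ["199007", "0.50", "-1.20"],
        ["199008", "-9.80", "0.30"],
        [None, None, None],
        ["Annual returns", None, None],
        [None, "Mkt-RF", "SMB"],
        ["1991", "12.00", "4.00"],
        ["1992", "-3.00", "1.50"],
    ]
)


def split_sections(frame: pd.DataFrame) -> list:
    """Return each continuous run of numeric rows as its own dataframe."""
    numeric = frame.apply(pd.to_numeric, errors="coerce")  # text becomes NaN
    is_data = numeric.iloc[:, 0].notna()
    # A new block starts whenever the data flag changes from the previous row.
    block_id = (is_data != is_data.shift(fill_value=False)).cumsum()
    groups = numeric[is_data].groupby(block_id[is_data])
    return [block.reset_index(drop=True) for _, block in groups]


def extract(section: pd.DataFrame, names: list, indices: list) -> pd.DataFrame:
    """Select columns by position, name them, and parse yearly dates."""
    result = section.iloc[:, indices].copy()
    result.columns = names
    years = result["Date"].astype(int).astype(str)
    result["Date"] = pd.to_datetime(years, format="%Y")
    return result


sections = split_sections(original)
yearly = extract(sections[1], ["Date", "Mkt-RF", "SMB"], [0, 1, 2])
print(yearly)
```

In this sketch the section index plays the role of SetsIndex and the lists of names and positions play the role of ColumnsName and ColumnsIndex from the metadata.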

Illustration of the original, raw, and processed data and transformations in each step:

For analysis, several plots are available depending on the type of the source. For the premiums from individual factors, it is possible to visualize the history of realized returns since inception with moving averages for various lengths of time; distribution of realized returns as a histogram and kernel density estimation with identification of several metrics (such as mean, median, maximum, minimum, standard deviation, skewness, and kurtosis); and association internally between the factors with the Pearson, Kendall, and Spearman correlation coefficients and scatter plots showing a linear regression model.

For the portfolios constructed based on factors, it is possible to visualize the history of realized returns since inception with moving averages for various lengths of time; distribution of realized returns as a histogram and kernel density estimation with identification of several metrics (such as mean, median, maximum, minimum, standard deviation, skewness, and kurtosis); cumulative realized returns with vintages beginning from each point in time and progressing through time on a linear or log scale with comparison against fixed returns and identification of several metrics (such as average returns for various lengths of time, time until a cumulative return of 10 times, and time for drawdowns until a positive return); and annualized returns from the compound annual growth rate from each point in time and progressing through time with a heatmap to illustrate the impacts of events and identification of several metrics (such as dispersion of results for various lengths of time). Although the annualized results appear to converge, a small compounding difference can have drastically divergent outcomes over long periods of time, as seen with the variation of cumulative realized returns.
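The annualized returns behind such a heatmap can be sketched as a matrix of compound annual growth rates, with one row per starting year and one column per ending year. The function name `cagr_matrix` and the sample returns are hypothetical, and the sketch assumes yearly returns expressed in percent.

```python
import numpy as np
import pandas as pd


def cagr_matrix(yearly_returns_pct: pd.Series) -> pd.DataFrame:
    """Annualized return in percent for every start year (row) and end year (column)."""
    growth = 1.0 + yearly_returns_pct / 100.0
    years = list(growth.index)
    result = pd.DataFrame(np.nan, index=years, columns=years)
    for position, start in enumerate(years):
        cumulative = growth.iloc[position:].cumprod()  # one vintage
        periods = np.arange(1, len(cumulative) + 1)
        annualized = cumulative.to_numpy() ** (1.0 / periods) - 1.0
        result.loc[start, cumulative.index] = annualized * 100.0
    return result


# Hypothetical yearly returns in percent; not actual data.
returns = pd.Series([10.0, -5.0, 20.0], index=[2001, 2002, 2003])
cagr = cagr_matrix(returns)
print(cagr.round(2))
```

Cells below the diagonal stay as NaN because an ending year cannot precede the starting year, which leaves the triangular shape typically seen in such a heatmap.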

Finally, for organization, a collection class has been created to hold and manage the sources as properties. This allows for designated analysis based on the types of the sources (such as yearly data compared to monthly data or individual factor premiums compared to portfolio constructions), as well as saving the dataframes of the sources as separate sheets in an XLSX file. Alternatively, the entire collection object can be saved as a PKL file through the Pickle module.
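The two ways of saving can be sketched with a small container. The `Collection` class below, its method names, and the choice to pickle the underlying dictionary of dataframes (rather than the object itself) are assumptions for illustration and are not the collection class of the project.

```python
import pickle
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class Collection:
    """Hypothetical container mapping a source label to its dataframe."""

    sources: dict = field(default_factory=dict)

    def add(self, label: str, frame: pd.DataFrame) -> None:
        self.sources[label] = frame

    def to_excel(self, path: str) -> None:
        # Needs an Excel engine such as openpyxl to be installed.
        with pd.ExcelWriter(path) as writer:
            for label, frame in self.sources.items():
                # Excel limits sheet names to 31 characters.
                frame.to_excel(writer, sheet_name=label[:31])

    def to_pickle(self, path: str) -> None:
        with open(path, "wb") as handle:
            pickle.dump(self.sources, handle)

    @classmethod
    def from_pickle(cls, path: str) -> "Collection":
        with open(path, "rb") as handle:
            return cls(sources=pickle.load(handle))


collection = Collection()
collection.add("three_factor_us", pd.DataFrame({"SMB": [1.0, 2.0], "HML": [3.0, 4.0]}))
```

Calling `collection.to_pickle("collection.pkl")` followed by `Collection.from_pickle("collection.pkl")` would restore the stored dataframes. Pickling the dictionary keeps the file readable even if the class definition changes later, at the cost of not storing any other state of the object.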

Latest public version available online through the Git repository hosted on GitLab: